Data Augmentation Effects on Highly Imbalanced EEG Datasets for Automatic Detection of Photoparoxysmal Responses

Photosensitivity is a neurological disorder in which a person’s brain produces epileptic discharges, known as Photoparoxysmal Responses (PPRs), when it receives certain visual stimuli. The current standardized diagnosis process used in hospitals consists of submitting the subject to the Intermittent Photic Stimulation process and attempting to trigger these phenomena. The brain activity is measured by an Electroencephalogram (EEG), and the clinical specialists manually look for the PPRs that were provoked during the session. Due to the nature of this disorder, long EEG recordings may contain very few PPR segments, meaning that a highly imbalanced dataset is available. To tackle this problem, this research focused on applying Data Augmentation (DA) to create synthetic PPR segments from the real ones, improving the balance of the dataset and, thus, the global performance of the Machine Learning techniques applied for automatic PPR detection. K-Nearest Neighbors and a One-Hidden-Dense-Layer Neural Network were employed to evaluate the performance of this DA stage. The results showed that DA is able to improve the models, making them more robust and more able to generalize. A comparison with the results obtained from a previous experiment also showed a performance improvement of around 20% for the Accuracy and Specificity measurements without Sensitivity suffering any losses. This project is currently being carried out with subjects at Burgos University Hospital, Spain.


Introduction
Photosensitivity is a neurological condition in which the patient's brain responds abnormally to certain visual stimuli, such as light reflections, flashing lights, or intermittent patterns. These anomalous responses are called Photoparoxysmal Responses (PPRs) and take the form of different types of electrical epileptic discharges, which can differ in intensity or in how far they spread throughout the brain. In this respect, the study published in [1] proposed a taxonomy of four different PPR types:
• Type-1: spikes in the occipital region.
• Type-2: spikes followed by a biphasic slow wave in the occipital and parietal regions.
• Type-3: spikes followed by a biphasic slow wave in the occipital and parietal regions, spreading to the frontal regions.
• Type-4: generalized poly-spikes and waves.
The higher the type's number, the more dangerous the photosensitivity is, with the Type-4 PPR corresponding to the most severe case. Generally, PPRs are epilepsy manifestations that can result in generalized seizures; however, only about 6% of epileptic people suffer from photosensitivity and PPRs [2]. PPRs have a causal relationship with the abrupt changes in visual stimulation characteristic of video games [3], advertisements [4], etc. The exposure of photosensitive and epilepsy patients to this kind of digital content and environments has become a major concern with the proliferation of video games and virtual reality technologies in recent years [5].
Diagnosing a patient's photosensitivity follows a standardized clinical procedure known as Intermittent Photic Stimulation (IPS) [6][7][8][9]. This procedure alternates resting and stimulation periods; during a stimulation period, a white flashing light switches on and off at a given frequency. The flashing frequency gradually increases from a minimum to a maximum and then gradually decreases back to the minimum. Whenever a PPR is detected, the current increasing or decreasing sequence is halted to avoid any seizure onset; the patient is then diagnosed with photosensitivity, and the photosensitivity frequency limits are determined. The neurophysiologists detect the PPRs using the Electroencephalogram (EEG) signals from the patient, monitoring these signals during the whole procedure. Figure 1 shows an example of two PPRs obtained while applying photic stimulation to a patient. This procedure has two main drawbacks. On the one hand, whether the white flashing stimulation is enough to detect all the types of PPR is arguable [10]. On the other hand, the amount of time required for a complete analysis of the EEG recordings is too large, reducing the efficiency of the whole procedure and the performance of the neurophysiology unit. Furthermore, the EEG recording in Figure 1 shows that it is often impossible to label a PPR as belonging to one single type because all the PPR types usually merge within a single PPR event. Moreover, external variables, such as medical treatment, sleep quality, or even the time of the day, may introduce variations in the EEG signals for a PPR event.
As a result, the EEG recordings from IPS sessions show several issues. Firstly, the morphology and features of the brain activity (including PPRs) may change significantly between different patients and even between different EEG sessions of the same patient due to the patient's ongoing condition [11]. Secondly, it is well known that the resting brain activity of the patients also presents remarkable differences according to their ongoing conditions, introducing noise in the process and difficulties in PPR detection. Finally, obtaining representative data from PPRs becomes a challenging task not only because of the small percentage of the affected population [2,12,13], but also because the number of PPRs within a single IPS EEG recording is fairly small, since the procedure stops once a PPR is found. As a consequence, gathering a suitable dataset for training and testing Artificial Intelligence (AI) and Machine Learning (ML) models is always compromised: in the normal scenario, these models deal with an extremely unbalanced dataset.
To our knowledge, the literature concerning automatic PPR detection only includes a handful of studies, showing that this problem has received little attention from the research community. For instance, some authors claim that the IPS procedure is not suitable for PPR detection, proposing flashing at high frequencies instead [10]. The study in [14] discriminated between normal and PPR stimulation regions by means of the aggregation of the Fourier components calculated on sliding windows, whose lengths varied according to the flashing frequency. Besides, the studies in [15,16] were the only ones that proposed ML for automatic PPR detection. There could be room for transfer learning from research in closely related fields, such as those focused on generalized epilepsy seizure detection. We can cite some examples, such as the research in [17], which proposed using Extreme Gradient Boosting for seizure classification, or the study in [18], which made use of the fluctuation of the EEG channel higher and lower frequencies; the use of the Permutation Rényi Entropy for differentiating interictal and ictal states was proposed in [19]. Simple ML techniques such as the Artificial Neural Network or K-Nearest Neighbors algorithm were also applied in the automatic detection of ictal discharges and inter-ictal states [20], while a more complex Deep Learning (DL) method such as a channel-independent Long Short-Term Memory Network was proposed in [21]. Other studies made use of different clinical equipment, bio-markers, and biomedical measures for the same purpose, such as Electrocardiography (ECG) [22][23][24], ECG combined with EEG data [25], electromyography [26,27], or magnetoencephalography [28]. The existence of such recent studies on a subject that has been so widely studied over several decades clearly shows that the detection and prediction of epileptic seizures are far from being solved [29].
Finally, the dataset imbalance problem can be addressed by creating synthetic data from the real dataset using Data Augmentation (DA) algorithms. Since recording and finding the most suitable clinical EEG recordings from real patients can demand much time and effort from neurophysiologists, DA is a very useful technique that grows the dataset with artificially created, realistic EEG PPR instances resembling the real ones; DA therefore aims to balance the dataset and, thus, to improve the capacity of obtaining competitive models.
DA has traditionally been applied mostly in image processing, but it is currently gaining popularity in Time-Series applications, where it ranges from very simple operations (such as jittering, warping, or slicing) to more advanced techniques such as the application of Generative Networks [30]. It has been widely applied with DL models in tasks such as forecasting [31,32] or medical problems, such as the monitoring of medication tampering in [33] using multivariate time signals, the introduction of a DA stage in the training of an LSTM-based DL model applied to fall detection in [34], or the improvement of ECG classification in [35].
Due to the complex patterns present in EEG signals, DL algorithms often perform better in their analysis because they can learn such patterns in greater depth, but a large-scale balanced dataset is needed, so DA is a fairly common solution to improve a clinical dataset: the works in [36,37] collected many recent studies that applied some kind of DA process to improve their DL performance. Even for the same task, different DA approaches can be used to create the synthetic data, as in the case of emotion recognition, where [38] decided to apply Convolutional Neural Networks (CNNs), while [39] used Deep Generative models. The work of [40] applied and compared the effect of six different DA techniques on classification with a CNN, from the simplest ones (averaging or segment recombination) to more refined ones such as autoencoders. A DL neural network that includes a combination of Data Augmentation and Domain Adaptation modules was proposed in [41] to improve the accuracy of single-channel EEG classification.
In this study, we focused on reducing the effects of the imbalanced nature of the dataset. To do so, we suggest the use of a very specific Cross-Validation stage complemented by an additional DA step. While the designed Cross-Validation scheme reduces the percentage of training samples from the majority class, the DA step aims to balance the training dataset by generating artificial PPR instances, mixing EEG windows that belong to the minority class. To evaluate the performance of these improvements, we propose using the two best PPR detection algorithms from our previous experiments: a two-class K-Nearest Neighbors classifier and a Dense-Layer Neural Network.
The structure of this paper is as follows: The next subsection gives some context to this research. Section 2 focuses on the detailed description of the materials and methodologies applied in the study. Section 3 presents the results of the experimentation and their analysis. The last section contains the conclusions of this research.

Virtual Reality and Artificial Intelligence for PPR Detection
This study is part of a larger project that proposes the introduction of Virtual Reality (VR) and AI technology for the photosensitivity diagnosis and evaluation process, as a supporting tool for neurophysiological specialists [15]. We call the integration of these two fields virtual-reality-enhanced Artificial Intelligence (vAI): VR offers the possibility of recreating real-world scenarios in a virtual environment, while AI algorithms are able to perform real-time analysis of EEG recordings, automatically detecting abnormal brain activity, such as PPRs, and reducing the time required by the neurophysiologists for this task. Figure 2 shows a scheme of how the IPS system currently works versus how it would work with vAI. New protocols can be designed using VR, including the development of training tools for the patients to learn how to distinguish dangerous scenarios. AI is responsible for assessing the level of stimulus to keep the patient safe.
Figure 2. vAI4Neuron project overview. EEG or magnetoencephalograms can be used for monitoring the brain activity. VR is in charge of the stimulus, while AI assists the experts in the decision.

Materials and Methods
Figure 3 shows the sequence of stages in this study; this section describes each of these stages. Firstly, Section 2.1 describes the dataset used in this study, giving the details on the data recording process. Secondly, Section 2.2 focuses on the designed DA approach for dataset balancing. Afterward, Section 2.3 introduces the preprocessing of the EEG signals, the feature extraction, and the dimensional reduction stages. The Cross-Validation stage introduced in this research is thoroughly defined and explained in Section 2.4. Finally, Section 2.5 focuses on the ML modeling techniques for classifying each EEG window as a PPR or a normal window.
Figure 3. Workflow of this research. The original EEG data from the dataset are windowed and labeled. The Cross-Validation introduces a subsampling for reducing the number of non-PPR windows, while DA takes place to balance the number of PPR windows in the training/testing dataset. Window preprocessing, feature extraction, and dimensional reduction are applied to each EEG window. Finally, the process ends with the training and testing of the models.

Dataset
The clinical neurophysiologists from the Neurophysiology Service at Burgos University Hospital recorded and annotated the dataset used in this research. This dataset included data from 10 patients diagnosed with different degrees of photosensitivity. Each patient went through an IPS session during which the EEG channels were recorded. The equipment in the facilities included a Natus Nicolet v44 for EEG recording and a Natus Neuroworks9 for real-time EEG visualization. The data were anonymized, so no extra information (such as gender or sex) was kept, following the hospital's privacy protocol for the study.
Each session consisted of a 3- to 5-min continuous recording using an EEG cap while the patient was stimulated by applying the first half of the IPS procedure, corresponding to the ascending standard frequencies from 1 Hz up to 50 Hz. The sampling rate for the EEG signals was 500 Hz, with up to 19 electrodes placed according to the 10-20 standardized system [42] (see Figure 4). The EEG recordings were analyzed considering the EEG average montage, which is the one that clinical neurophysiologists use on a daily basis. This average montage computes the global average of the values among all the EEG channels; then, each sample is modified by subtracting this calculated global average value. One-second intervals are marked in this montage, representing the standard time slot used in the EEG analysis. To annotate the EEG recordings, the clinical specialists visually analyzed all raw EEG recordings, marking the starting and ending points of any PPR triggered during the stimulation. PPR phenomena can present a very variable duration (e.g., one may be triggered for only one-tenth of a second, while the next can last up to five seconds straight). As mentioned before, the montage has one-second intervals to serve as a guide for the clinical specialists, so the EEG recordings were split using a sliding window of 1 s in length with 90% overlap. Then, each window was manually labeled as PPR or non-PPR according to whether or not it included part of a PPR interval. Before applying Data Augmentation, the balance of the original dataset was as follows:
• Total number of EEG windows: 29,190:
- Number of non-PPR windows: 27,968 (95.81%).
- Number of PPR windows: 1222 (4.19%).
As can be observed, the dataset showed an extreme imbalance, negatively affecting the learning capacity of the models. This research proposes a DA strategy to tackle this problem, as explained in the next section.
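The montage, windowing, and labeling steps described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names (`average_montage`, `sliding_windows`, `label_window`) and the toy annotation are hypothetical, and only the sampling rate, the 1 s window, the 90% overlap, and the "window includes part of a PPR interval" labeling rule come from the text.

```python
FS = 500           # sampling rate in Hz (from the text)
WIN = FS           # 1-second window = 500 samples
STEP = WIN // 10   # 90% overlap -> advance by 10% of the window (50 samples)


def average_montage(channels):
    """Subtract, at every sample, the mean over all channels."""
    n_samples = len(channels[0])
    means = [sum(ch[t] for ch in channels) / len(channels) for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in channels]


def sliding_windows(n_samples, win=WIN, step=STEP):
    """Return (start, end) sample indices of every full window; end is exclusive."""
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]


def label_window(start, end, ppr_intervals):
    """Label 1 (PPR) if the window overlaps any annotated PPR interval, else 0."""
    return int(any(start < p_end and p_start < end for p_start, p_end in ppr_intervals))


# Toy example: 3 s of signal with one hypothetical PPR annotated at samples 700-900.
windows = sliding_windows(3 * FS)
labels = [label_window(s, e, [(700, 900)]) for s, e in windows]
```

Because of the 90% overlap, a single short PPR event is covered by many consecutive windows, which is why the number of PPR windows is much larger than the number of PPR events.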

The Data Augmentation Strategy
The DA stage aimed to generate realistic new PPR windows that clearly resemble actual PPR ones. Using the DA stage increases the number of PPR windows, introducing more representativeness to the training dataset. For the purposes of this research, we define an ad hoc method for DA that merges two actual PPR windows. The idea is to split the selected windows into n intervals, generating a new PPR window by collecting alternating intervals from each parent, a similar approach to the recombination method applied in [40]. Nevertheless, we must consider when and where a PPR appears within a PPR window; four different possibilities arise: (i) the PPR window contains the starting part of a PPR event; (ii) the PPR window contains the final part of a PPR event; (iii) the PPR window represents part of a PPR event; (iv) the PPR window contains the whole PPR event (see Figure 5, which depicts these four cases). The DA stage must merge PPR windows from the same case to produce better and more realistic synthetic windows. Otherwise, the obtained EEG segments may become meaningless from a neurophysiological point of view. Additionally, the DA also balances the dataset in terms of the number of PPR windows from each case, so the total number of new instances is the same for each of them.
Therefore, in creating a new realistic synthetic PPR window C, the DA first selects the group to extend (one of the four possibilities mentioned before), choosing two random actual PPR windows A and B from the dataset. Besides, the number of cut-off points n is defined in advance, dividing both A and B windows into n + 1 segments of the same length (in this study, we kept n = 3, so windows A and B were cut by three cut-off points located every 125 samples (at Samples 125, 250, and 375), thus generating 4 segments of 125 samples in length each).
The new synthetic PPR window C is then composed of alternating segments, one from each parent A and B, as shown in Figure 6: starting with the first segment from A, then the second segment from B is added, then the third segment from A again, and so on. Equation (1) shows the general sequence to produce the synthetic PPR window from its parents for the current n parameter.
Besides, abrupt EEG signal changes need to be avoided when merging two PPR windows because they do not occur in real EEG signals; such abrupt changes may arise from a large difference between the ending value of one segment and the starting value of the next segment in the sequence. In this study, interpolation around the cut-point helps to keep the signal's continuity, providing smooth transitions between segments and avoiding these sudden value changes. Equation (2) shows the proposal for the interpolation, where x is the cut-point, for the case in which the left segment is from A and the right segment is from B; when the left segment is from B and the right segment is from A, the formula is applied with the roles of the parents reversed. Furthermore, Figure 6 illustrates the generation and smoothing of the changes; when more than one EEG channel is considered, the same cut-points are used for all the channels to keep coherency.
Figure 6. Example of the DA technique. A newly generated synthetic realistic PPR window (green signal) using two fully covered windows: A (blue) and B (orange). Dotted lines represent the cut-points where the signals were split.
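The recombination step can be sketched in a few lines. The segment layout (n = 3 cut-points at Samples 125, 250, and 375, alternating A, B, A, B) follows the text; the smoothing, however, is an assumption: the sketch uses a simple linear cross-fade over a hypothetical `blend` half-width around each cut-point, which is not necessarily the interpolation of Equation (2). The function name `recombine` is also hypothetical.

```python
def recombine(a, b, n_cuts=3, blend=10):
    """Build a synthetic window from two parents of equal length.

    Segments alternate A, B, A, B, ... and a linear cross-fade (an assumed
    stand-in for the paper's interpolation) smooths each cut-point.
    """
    assert len(a) == len(b)
    length = len(a)
    seg = length // (n_cuts + 1)          # 500 // 4 = 125 samples per segment
    parents = (a, b)
    # Alternate segments between the two parents.
    child = [parents[(i // seg) % 2][i] for i in range(length)]
    for k in range(1, n_cuts + 1):
        x = k * seg                        # cut-points: 125, 250, 375
        left, right = parents[(k - 1) % 2], parents[k % 2]
        for i in range(max(0, x - blend), min(length, x + blend)):
            w = (i - (x - blend)) / (2 * blend)   # 0 -> left parent, approaches 1 -> right parent
            child[i] = (1 - w) * left[i] + w * right[i]
    return child


# Toy parents: a flat window at 0.0 and a flat window at 1.0.
child = recombine([0.0] * 500, [1.0] * 500)
```

For multichannel data, the same function would be applied to every channel of A and B, so that the cut-points coincide across channels as the text requires.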
Interestingly, the new synthetic EEG windows have a very similar spectrogram to that of the parents. Figure 7 shows the spectrograms of a DA generation, with two random parents from one of the groups and the synthetic window generated from them. Some changes in the spectrograms appear in the range from 50 to 100 Hz, although the main distribution of the signals is similar in both the parents and the synthetic window. We assumed that the observed changes are not meaningful, which means that the cut-points and the interpolation proposed in this research do not introduce any relevant disturbances to the time series. In any case, the subsequent preprocessing stages filter out these differences.

Preprocessing and Dimensional Reduction
After applying DA to the dataset, preprocessing, feature extraction, and dimensional reduction take place (see Figure 8). Despite recording 19 EEG channels, neurophysiologists only consider five of them to detect PPRs: Fz, F2, F4, O1, and O2. In a previous study [15], we analyzed this subset of channels and concluded that Fz was the most plausible channel for PPR detection if only one channel were to be selected; therefore, for this study, we used only this Fz channel. Future work will consider using more EEG channels for PPR detection to improve the performance of the models. The preprocessing of an EEG window includes removing the average and applying a notch filter at 50 Hz plus a band-pass filter in the range of 1 to 50 Hz.
Figure 8. Workflow for the preprocessing and dimensional reduction stages. Firstly, the dataset was augmented, introducing new EEG windows. Afterwards, the preprocessing, feature extraction, and feature reduction stages were applied in sequence to every EEG window (either augmented or original).
Up to 31 features from different domains were calculated using the library TSFEL [43]; Table 1 lists the domains and transformations considered, with the corresponding study where the transformation was originally defined. These features were scaled to the interval [0.0, 1.0], and Principal Component Analysis (PCA) [44] was performed; the PCA components representing up to 95% of the variability in the data were preserved, leading to a new domain of 12 features. It is worth mentioning that we also performed dimensional reduction using Independent Component Analysis (ICA) and Locally Linear Embeddings (LLEs); however, PCA produced the best results.

Table 1. Domains and transformations considered.
Temporal Domain: Sum of Absolute Values, Maximum Amplitude, Sum of Absolute Differences, Total Energy, Absolute Energy, Area Under the Curve, Entropy, Autocorrelation.
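The scaling step and the 95% variability criterion can be illustrated with a small sketch. It does not compute the PCA itself; it only shows min-max scaling to [0.0, 1.0] and how the number of retained components follows from a list of component variances. The function names, the toy feature rows, and the eigenvalues are hypothetical, and fitting the scaling bounds on a reference set of rows is an assumption about the procedure.

```python
def fit_min_max(rows):
    """Per-feature minimum and maximum over the reference rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_scale(rows, lo, hi):
    """Map each feature to [0, 1]; constant features are mapped to 0."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def n_components_for_variance(eigenvalues, target=0.95):
    """Smallest number of leading components whose variance share reaches target."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= target:
            return k
    return len(eigenvalues)


# Toy feature matrix (3 windows x 2 features) and hypothetical component variances.
features = [[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]]
lo, hi = fit_min_max(features)
scaled = min_max_scale(features, lo, hi)
kept = n_components_for_variance([6.0, 3.0, 0.8, 0.2])
```

In practice, a library implementation would normally be used; for instance, scikit-learn's `PCA` accepts a fractional `n_components` (such as 0.95) to select components by explained variance.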

Cross-Validation Scheme
Applying DA alone is not enough to counterbalance the highly imbalanced character of the dataset: the percentage of synthetic PPR windows would be too high, biasing the learning process. Therefore, the number of non-PPR instances must also be reduced, so that the combined action of the two methods produces a balanced dataset. This reduction in the number of non-PPR windows should be carefully performed so that the subsampling process does not introduce any bias either.
In a preliminary study, unsupervised learning was used to group the non-PPR windows, aiming to find some structure in the data. Unfortunately, independent of the clustering technique, the great majority of the non-PPR windows were grouped in the same big cluster. Instead, sub-sampling with replacement is proposed for the reduction in the number of non-PPR windows, as depicted in Figure 9. Thus, each fold of non-PPR windows includes randomly selected non-PPR windows. Interestingly, a non-PPR window can be selected for more than one fold: in this way, we tried to keep the variability and heterogeneity in all the different folds generated in the Cross-Validation process. Probabilistic sub-sampling with replacement could have been used as well, reducing the selection probability of a non-PPR window once it has been chosen for a fold; however, for the sake of simplicity, we kept the normal sub-sampling with replacement.
Each final training fold includes a balanced number of PPR (either actual or synthetic) and non-PPR windows. Furthermore, two test datasets were generated: without and with DA-generated windows. These two test sets were created to compare the models' performances using only real data and using a combination of real data and DA instances. This performance comparison helps in understanding how the DA stage affects each of the models. Figure 9 shows the procedure for the generation of the training and test sets, which follows these steps:
1. There were 3000 non-PPR and 500 PPR windows randomly selected.
2. There were 2500 synthetic PPR windows created from the 500 selected, up to a total of 3000 PPR instances.
3. The training set was formed by these 6000 instances with a perfect balance of 50%.
4. Two test sets were formed:
• The first one (Test 1) was made from the rest of the non-PPR and only the real PPR windows.
• The test synthetic PPR windows were created up until another 3000 PPR instances again.
• The second test set (Test 2) was created from the rest of the non-PPR and the synthetic PPR windows.
Figure 9. Cross-Validation data generation for each repetition. The training set includes 500 PPRw plus 2500 data-augmented PPRw (daPPRw) plus 3000 non-PPR windows (non-PPRw); PPRw and non-PPRw were subsampled from the original dataset. Two test datasets (Test 1 and Test 2) were created: without and with DA, respectively. Test 1 included all the remaining instances from the original dataset (722 PPRw and 24,968 non-PPRw), while Test 2 also included (3000-722) daPPRw.
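The bookkeeping of one repetition can be sketched as below. The sketch works on window identifiers only and is an interpretation, not the authors' code: each repetition draws its own random subset, so a window cannot repeat inside a fold but may reappear in other repetitions. The function name `make_repetition` and the integer identifiers are hypothetical; the counts come from the text (1222 PPR windows follow from 29,190 minus 27,968).

```python
import random


def make_repetition(ppr_ids, non_ppr_ids, rng, n_ppr=500, n_non_ppr=3000):
    """Select the windows of one repetition and derive the set sizes."""
    train_ppr = rng.sample(ppr_ids, n_ppr)
    train_non = rng.sample(non_ppr_ids, n_non_ppr)
    train_ppr_set, train_non_set = set(train_ppr), set(train_non)
    # Everything not used for training goes to Test 1 (real data only).
    test_ppr = [i for i in ppr_ids if i not in train_ppr_set]
    test_non = [i for i in non_ppr_ids if i not in train_non_set]
    return {
        "train_ppr": train_ppr,
        "train_non_ppr": train_non,
        "n_train_synthetic": n_non_ppr - n_ppr,           # 2500 DA windows
        "test1_ppr": test_ppr,
        "test1_non_ppr": test_non,
        "n_test2_synthetic": n_non_ppr - len(test_ppr),   # 3000 - 722 DA windows
    }


rng = random.Random(0)                      # fixed seed, only for reproducibility
ppr_ids = list(range(1222))                 # 1222 PPR windows
non_ppr_ids = list(range(1222, 29190))      # 27,968 non-PPR windows
rep = make_repetition(ppr_ids, non_ppr_ids, rng)
```

Calling `make_repetition` again with the same `rng` yields a different, possibly overlapping selection, which mirrors the idea that a non-PPR window can appear in more than one fold.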

Modeling and Evaluation
Once the training and test sets were created, the training and the evaluation of the ML classifiers were performed. The algorithms selected for this study were the best techniques found in our preliminary research [16]: two-class K-Nearest Neighbors (2C-KNN) and a Neural Network with a Dense Layer as the hidden layer (DL-NN). We considered other ML models, such as SVM or Random Forests [15]; however, the performance of the 2C-KNN and the DL-NN was better than that of the other methods. For each model, different parameter values were also tested: for 2C-KNN, the number of neighbors K was tested over {3, 5, 7, 9, 11, 13, 15}; for DL-NN, the number of hidden neurons N was tested over {10, 20, 30, 40, 50}.
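As an illustration of the simpler of the two classifiers, the following sketch implements a two-class K-Nearest Neighbors vote. The distance metric is not stated in the text, so the Euclidean distance used here is an assumption; the function name `knn_predict` and the toy 2-D points are hypothetical (the real inputs would be the 12 PCA components of each window).

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy training data: label 0 = non-PPR, label 1 = PPR.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 0.8), (0.8, 1.0)]
train_y = [0, 0, 0, 1, 1, 1]

# Try several odd values of K, as done in the text for {3, 5, ..., 15}.
predictions = {k: knn_predict(train_x, train_y, (0.9, 0.9), k) for k in (3, 5)}
```

Odd values of K avoid ties in a two-class vote, which is consistent with the set of K values tested in the study.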
To measure the performance of the ML techniques for the PPR detection, the Accuracy (ACC), Sensitivity (SENS), and Specificity (SPEC) measurements were calculated according to Equations (3)-(5), respectively:
$$\mathrm{ACC} = \frac{TP + TN}{N_{data}} \quad (3)$$
$$\mathrm{SENS} = \frac{TP}{N_{data\_positive}} \quad (4)$$
$$\mathrm{SPEC} = \frac{TN}{N_{data\_negative}} \quad (5)$$
where TP, TN, FP, and FN are the True Positive, True Negative, False Positive, and False Negative classification values, respectively; $N_{data}$ is the total number of instances in the dataset, and $N_{data\_positive}$ and $N_{data\_negative}$ are the number of positive instances and the number of negative instances within the dataset. ACC is an informative measure of the performance of a model when the data are balanced, while SENS and SPEC are more informative for unbalanced problems.
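A short sketch makes the difference between the three measurements concrete on an imbalanced toy set. The function name `scores` and the toy labels are hypothetical; the definitions follow the standard ones described above, with label 1 standing for PPR and label 0 for non-PPR.

```python
def scores(y_true, y_pred):
    """Return ACC, SENS, and SPEC for binary labels (1 = PPR, 0 = non-PPR)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "ACC": (tp + tn) / len(y_true),
        "SENS": tp / (tp + fn),   # tp + fn = number of positive instances
        "SPEC": tn / (tn + fp),   # tn + fp = number of negative instances
    }


# Toy imbalanced set: 95 non-PPR and 5 PPR windows, classifier always says non-PPR.
y_true = [0] * 95 + [1] * 5
always_negative = scores(y_true, [0] * 100)
```

The trivial classifier reaches a high ACC while detecting no PPR at all, which is why SENS and SPEC are reported alongside ACC for this kind of data.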
Furthermore, to compare more directly the effects of the different parameter values and find the best configuration, the Receiver Operating Characteristic (ROC) curves and their associated Area Under Curve (AUC) values were used. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) of the classification, and each curve has its own AUC value in the range [0, 1]. When comparing models using the ROC curve, the closer the curve of the model passes to the perfect classification point [FPR = 0, TPR = 1], the higher the AUC value and the better the model is. Only ROC curves corresponding to the global performance of the system were computed.
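The ROC curve and its AUC can be computed from per-window scores as in the following sketch, which sweeps the decision threshold over the sorted scores and integrates with the trapezoidal rule. The function names and the toy scores are hypothetical, and the text does not state how the scores were obtained for each model.

```python
def roc_points(y_true, y_score):
    """Return the (FPR, TPR) points of the ROC curve, starting at (0, 0)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (handles ties).
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy scores: higher means "more likely PPR".
perfect = auc(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
mixed = auc(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

A perfectly separating score passes through the point [FPR = 0, TPR = 1] and yields an AUC of 1, while any misordered positive-negative pair lowers the area.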

Results and Discussion
Recall that the classification results were obtained after training each model with each training fold and evaluating it with two different test sets: a test set created before applying DA (i.e., only the available real data; it is called Test 1) and a test set formed after applying DA (i.e., simulating different cases that are not yet covered by the real data; it is called Test 2) (see Figure 9). We examined whether introducing Test 2 allowed us not only to improve the already tested classifiers, but also to discriminate between models, so that a better model selection can be performed.
The complete PPR detection performance results of both 2C-KNN and DL-NN techniques are shown in Tables 2 and 3, respectively: each table shows the results using Test 1 (without DA) in the upper half, while the lower half shows the results using Test 2 (with DA). The best version of each classifier was the one with the parameter K fixed at three neighbors in the case of 2C-KNN and the one with the parameter N fixed at 10 hidden neurons for the DL-NN. Their performances are included in Table 4; the boxplots (see Figure 10) and ROC curves (see Figure 11) display a comparison between the performance results before and after the application of DA, allowing the analysis of the effects of the DA procedure on the results for both classifiers. The application of the DA step to increase the PPR instances and balance the dataset reduced the variance of the performance values, which means that it managed to make both classifiers more robust. As can be seen, the ACC and SPEC measurements reached very high values: around 95% for 2C-KNN, while those from the DL-NN were around 98%. In terms of SENS, 2C-KNN and the DL-NN reached values around 75% and 85%, respectively. By visually comparing these results, it is clear that the best PPR detection technique was the DL-NN method with 10 hidden neurons.
Results depicted in the boxplots and tables show the DL-NN models outperforming the 2C-KNN. However, the differences are smaller when comparing the performance of each type of model with the two experimentation setups, Test 1 and Test 2. For this comparison, we decided to use the SENS measurement because it is the performance metric with the worst results (smaller values and higher dispersion), testing whether the results from both experimentation setups belong to the same population with the same probability distribution. A hypothesis test is needed to determine whether or not the designed experimentation with DA allows for obtaining better models.
We ran the Shapiro-Wilk normality test and the Levene homogeneity test to determine if the sub-populations followed a normal distribution and if they had the same variance; according to our results, all the sub-populations were normally distributed and presented the same variance. In this case, the most suitable hypothesis test is the parametric version of the Analysis Of Variance (ANOVA) test [51]; we ran this test at a confidence level of 95% (i.e., a significance level of 0.05). We applied the following ANOVA tests:
• To determine if all the 2C-KNN models obtained using Test 1 belong to the same population or not.
• To determine if all the 2C-KNN models obtained using Test 2 belong to the same population or not.
• To determine if all the DL-NN models obtained using Test 1 belong to the same population or not.
• To determine if all the DL-NN models obtained using Test 2 belong to the same population or not.
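To make the test concrete, the sketch below computes the one-way ANOVA F statistic for several groups of SENS values, one group per parameter setting. The SENS values are hypothetical and serve only as an example; the sketch stops at the F statistic and its degrees of freedom, since obtaining the p-value requires the F distribution (available, for instance, through `scipy.stats.f_oneway`, with `scipy.stats.shapiro` and `scipy.stats.levene` covering the preliminary checks).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variability explained by the group means versus variability inside groups.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical SENS values (%) over three folds for three parameter settings.
sens_by_setting = [
    [71.0, 72.0, 73.0],
    [72.0, 73.0, 74.0],
    [75.0, 76.0, 77.0],
]
f_stat, df_b, df_w = one_way_anova_f(sens_by_setting)
```

A large F value relative to the critical value at the 0.05 significance level leads to rejecting the null hypothesis that all the settings come from the same population.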
Results from the ANOVA tests showed that the 2C-KNN models obtained using the Test 1 experimentation belonged to the same distribution, making it impossible to decide which parameter set is the best. In contrast, the results from 2C-KNN using the Test 2 experimentation performed differently: the ANOVA test rejected the null hypothesis that all of them belong to the same population. Therefore, for the 2C-KNN, the Test 2 experimentation setup allowed us to choose a better model and parameter set. For the DL-NN models, however, the ANOVA test did not reject the null hypothesis: the Test 2 experimentation setup did not differentiate the performance of the parameter subsets.
In this latter case of the DL-NN models, the performance did not improve regardless of the amount of available data or the network size. No over-fitting was detected, so we can conclude that we reached the performance bound for the evaluated DL-NN models given the available data. More complex DL structures and, consequently, more representative data are needed to improve these results.
Unfortunately, we cannot compare against other approaches in the literature. As mentioned before, the only study found so far on PPR detection was our previous research shown in [16], which focused on Type-4 PPR detection; Figure 12 shows the results obtained in our previous study. A comparison between the results from the two pieces of research shows that the SENS measurement was similar in both cases; however, the ACC and SPEC measurements were much worse in [16] than in this research. This fact reinforces our hypothesis that detecting PPRs as a single unique label produces a better performance of the models. In the same way, there is a significant increase in the performance of the models using the experimental setup described in this research.
Figure 12. Results from the Type-4 PPR detection experiment performed in [16]. On the left, the boxplot of the results for the 2C-KNN technique with K fixed at 9 neighbors. On the right, the boxplot of the results for the DL-NN method with the parameter N fixed at 20 hidden neurons.

Conclusions
For this study, our research team proposed incorporating a DA stage in a previously developed automatic PPR detection procedure [15,16] to solve our dataset imbalance problem. A Cross-Validation-based scheme with an additional DA step was applied in order to increase the number of PPR windows (the minority class, which accounts for only about 4% of the total number of instances in the dataset) and undersample the excess of non-PPR windows within each fold, thus balancing the training set.
Raw EEG signals were windowed and labeled before the DA technique was applied to artificially create new raw PPR windows that resemble the real ones as much as possible. Once the synthetic data were created, EEG windows were preprocessed and dimensionally reduced by extracting a total of 32 features from different domains and applying the PCA algorithm to reduce them into an even smaller set of 12 uncorrelated components. The final data were used for training and evaluating the ML models selected for the detection task: 2C-KNN and DL-NN.
This process was designed to detect all PPR types, contrary to our previous study, which was focused on the detection of only Type-4 PPRs [16]. DA allowed us to achieve better detection results compared to those previously obtained despite the incorporation of higher morphological variability due to the quite different waveforms of the PPR types, making the models more robust. Both methods performed very well, reaching average values of the ACC and SPEC around 95% for 2C-KNN and above 98% for the DL-NN. However, the SENS measurement showed that the DL-NN method was better than the 2C-KNN algorithm, with average values of almost 85% compared to the average of 75% for 2C-KNN.
This detection performance was achieved by using simple ML techniques. For future work, DL models will be tested for this purpose due to their increased ability to learn the different complex PPR patterns more easily. Moreover, it is likely that, in the case of the DL-NN, integrating the DA step as an additional layer of the neural network and analyzing different data generator models can lead to better and more robust models. However, due to the high imbalance of the data, it may be convenient to approach the problem as a one-class problem oriented toward anomaly detection.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study. Written informed consent was obtained from the patient(s) to publish this paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: